NONTECHNICAL SUMMARY

A Markovian Decision Process is a process which is observed at distinct
time points to be in some state. After observing the state of the system
an action is chosen; corresponding to the action (and the present state)
a cost is incurred and the transition probabilities for the next state are
determined. A policy is any rule for choosing actions. Corresponding
to each policy there is an expected long run average cost per unit time.

This  paper  is  concerned  with  finding  an  optimal  policy  -  i.e.  one  whose 
associated  average  cost  is  as  small  as  possible. 

For  example  we  might  have  a  machine  which  deteriorates  with  time. 

The  state  of  the  system  could  be  the  condition  of  the  machine,  and  the 
possible  actions  could  be  either  to  replace  the  machine  or  not. 

Associated  with  each  state  there  would  be  an  operating  cost.  Thus  a 
policy  is  a  rule  for  determining  when  to  replace  the  machine  and  an  optimal 
one  is  one  which  minimizes  the  long  run  average  cost. 

In  this  paper  we  let  the  state  space  be  countable  and  present 
sufficient  conditions  for  the  existence  of  an  optimal  policy  and  for  it  to 
be  of  simple  form.  This  form  -  called  stationary  deterministic  -  is  of  the 
form  of  a  function  from  the  state  space  to  the  action  space.  For  example 
in  the  machine  problem  a  stationary  deterministic  policy  would  replace 
whenever  the  machine  is  in  a  certain  specified  class  of  states. 

In a special case the average cost criterion is shown to be
equivalent to the discounted cost criterion. This latter criterion
has been extensively studied. Under certain conditions the optimal
discounted cost policies are shown to be almost optimal for the average
cost criterion.

It is also shown that when a replacement action exists (as in
the machine problem) then there always exists an optimal policy and the
form of this policy is given. The final section gives a counterexample
which shows that the optimal rule cannot always be taken to be of the
stationary form, where a stationary policy is one in which at each state
the action may be chosen according to some fixed randomization scheme.
For instance, in the machine problem a stationary policy is one which for
each state gives a probability for replacing the machine.


NON-DISCOUNTED DENUMERABLE MARKOVIAN DECISION MODELS


Sheldon  M.  Ross 


Introduction 

We are concerned with a process which is observed at times
$t = 0,1,2,\ldots$ to be in one of a possible number of states. We let $I$
(assumed denumerable) denote the set of possible states. If at time
$t$ the system is observed in state $i$ then one of $K_i$ possible actions must
be taken. Unless otherwise noted we shall assume throughout that $K_i < \infty$
for all $i$.

If the system is in state $i$ at time $t$ and action $K$ is chosen then
two things occur:

(i) We incur an expected cost $C(i,K)$, and
(ii) $P\{X_{t+1} = j \mid X_0, A_0, \ldots, X_t = i, A_t = K\} = P(i,j:K)$,

where $\{X_r\}_{r=0}^{t+1}$ denotes the sequence of states and
$\{A_r\}_{r=0}^{t+1}$ the sequence of decisions up to time $t+1$.


Thus both the costs and the transition probabilities are functions
only of the last state and the subsequently made decision. It is assumed
that both the expected costs $C(i,K)$ and the transition probabilities
$P(i,j:K)$ are known. Furthermore it is assumed that the expected costs
are bounded, and we let $M$ be such that $|C(i,K)| < M$ for all $i,K$.

A rule or policy $R$ for controlling the system is a set of functions
$\{D_K(X_0,A_0,\ldots,X_t)\}_{K=1}^{K_{X_t}}$ satisfying

$$0 \le D_K(X_0,A_0,\ldots,X_t) \le 1, \quad K = 1,\ldots,K_{X_t}, \qquad \text{and} \qquad \sum_{K=1}^{K_{X_t}} D_K(X_0,A_0,\ldots,X_t) = 1$$

for every history $X_0,A_0,\ldots,X_t$, $t = 0,1,\ldots$

The interpretation being: if at time $t$ we have observed the history
$X_0,A_0,\ldots,X_t$ then action $K$ is chosen with probability
$D_K(X_0,A_0,\ldots,X_t)$.

We say that a rule $R$ is stationary if $D_K(X_0,A_0,\ldots,X_t = i) = D_{iK}$,
independent of $X_0,A_0,\ldots,A_{t-1}$ and $t$. We say that a rule $R$ is stationary
deterministic if it is stationary and also $D_{iK} = 0$ or $1$. Thus the
stationary deterministic rules are those non-randomized rules whose actions
at time $t$ depend only on the state at time $t$. We denote by $C''$ the class of
stationary deterministic rules.

Following Derman [2] the process $\{(X_t,A_t),\ t = 0,1,2,\ldots\}$ will be
called a Markovian Decision Process.

Two possible measures of effectiveness of a rule governing a
Markovian Decision Process are the expected total discounted cost and
the expected average cost per unit time. The first assumes a
discount factor $\beta \in (0,1)$ and for a starting state $X_0 = i$ the objective is
to minimize

$$\psi(i,\beta,R) = E_R \sum_{t=0}^{\infty} \beta^t\, C(X_t,A_t).$$

The second criterion tries to minimize, for a given $X_0 = i$,

$$\varphi(i,R) = \limsup_{n \to \infty} E_R \sum_{t=0}^{n} \frac{C(X_t,A_t)}{n+1}.$$

Since costs are bounded and adding a constant to all the costs $C(i,K)$
will affect all rules identically in both criteria we may without loss of
generality assume that costs are non-negative.

Blackwell in [1] has shown for the discounted case that if $K_i < \infty$
and $\{C(i,K)\}$ is bounded then there exists a stationary deterministic optimal
rule. We shall be mainly concerned in this paper with the average cost
criterion. The first results for the average cost criterion which did not
assume a finite state space were given by Taylor [9] who worked with a
replacement model. A replacement model is one in which there is a distinguished
state 0 and action $a_0$ such that $X_0 = 0$ and

$$P(i,j:a_0) = \begin{cases} 1 & j = 0 \\ 0 & \text{otherwise.} \end{cases}$$

Taylor showed that in the finite action replacement model if one can
restrict attention to those rules whose expected time between replacements
is uniformly bounded then there exists a stationary deterministic optimal
rule and it is determined from a functional equation. Taylor's method was
to treat the average cost problem via the known results of the discounted
cost problem.

Derman [4] has recently dealt with the countable state, finite (for
each state) action general Markovian model. He treats the problem without
using the results for the discounted problem and gives sufficient conditions
for the existence of a stationary deterministic optimal rule. Unfortunately
this condition, the existence of a bounded solution of the functional
equation

$$g + f(i) = \min_K \Big\{ C(i,K) + \sum_j P(i,j:K)\, f(j) \Big\},$$

cannot be checked directly. Derman's paper [4], however, in conjunction with
a later joint paper [5] of Derman and Veinott, shows that a sufficient
condition for the above is that

(i) for each rule $R \in C''$ the resulting Markov chain is positive recurrent, and

(ii) there exists some state (say 0) and a constant $T < \infty$ such that
$M_{i0}(R) < T$ for all $i$ and all $R \in C''$, where $M_{i0}(R)$ denotes the mean
recurrence time from state $i$ to state 0 when using rule $R$. [Note that
for any rule $R \in C''$ the resulting sequence of states forms a Markov chain.]

In the first section of this paper, by following the approach of Taylor,
we give a somewhat simpler proof of Derman's results. Also our sufficient
conditions will be somewhat weaker: we won't require condition (i) and
won't require that $M_{i0}(R) < T$ for all rules $R$. We also show the
connection between the average cost optimal rule and the optimal discounted
cost rules: speaking loosely, the former is a limit point of the latter rules.

The second section shows how, in a special case, the average cost case
can be reduced to the discounted cost case.

The third section deals with $\varepsilon$-optimal rules, and a sufficient condition
is given for the optimal discounted rules to be $\varepsilon$-optimal.

The fourth section deals with the Replacement Problem and it is shown
that an optimal rule always exists but it may not be of the stationary
deterministic type.

The fifth section gives an example of an optimal nonstationary rule
which is better than any stationary rule.

1. On the existence of a stationary deterministic optimal rule

We shall need the following result given by Blackwell [1]:

If $K_i < \infty$ and $C(i,K) < M$ for all $i,K$ then under the $\beta$-discounted criterion
with $0 < \beta < 1$, there exists a stationary deterministic rule $R_\beta$ such that
$\psi(i,\beta,R_\beta) = \min_R \psi(i,\beta,R)$ for all $i \in I$. Furthermore
$\{\psi(i,\beta,R_\beta),\ i \in I\}$ is the unique solution to

(1) $$\psi(i,\beta,R_\beta) = \min_K \Big\{ C(i,K) + \beta \sum_j P(i,j:K)\, \psi(j,\beta,R_\beta) \Big\}, \quad i \in I,$$

and any stationary deterministic rule which when in state $i$ selects an
action which minimizes the right side of (1) is optimal.
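Equation (1) can be solved by successive approximations when the state space is finite. The sketch below is an illustration under stated assumptions, not Blackwell's own procedure: the three-state machine (`C`, `P`), the discount factor 0.9 and the function name `value_iteration` are hypothetical. Action 0 is "keep" and action 1 is "replace", which returns the machine to state 0.

```python
def value_iteration(C, P, beta, tol=1e-10):
    """Iterate psi(i) <- min_K {C[i][K] + beta * sum_j P[i][K][j] * psi(j)}.

    Returns the approximate fixed point and the actions chosen in the final sweep.
    """
    n = len(C)
    psi = [0.0] * n
    while True:
        new, rule = [], []
        for i in range(n):
            values = [C[i][k] + beta * sum(P[i][k][j] * psi[j] for j in range(n))
                      for k in range(len(C[i]))]
            best = min(range(len(values)), key=values.__getitem__)
            rule.append(best)
            new.append(values[best])
        if max(abs(a - b) for a, b in zip(new, psi)) < tol:
            return new, rule
        psi = new


# Hypothetical machine: states 0 (new), 1 (worn), 2 (failed).
C = [[1.0, 4.0], [2.0, 4.0], [6.0, 4.0]]              # C[i][K]
KEEP = [[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]
REPLACE = [1.0, 0.0, 0.0]
P = [[KEEP[i], REPLACE] for i in range(3)]             # P[i][K][j]

psi, rule = value_iteration(C, P, beta=0.9)

if __name__ == "__main__":
    print(psi, rule)
```

The returned `rule` is a stationary deterministic rule in the sense of the text: it maps each state to a single action.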

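Following Taylor, for any $\beta \in (0,1)$ and $i,j \in I$ let

(2) $$f_\beta(i,j) = \psi(i,\beta,R_\beta) - \psi(j,\beta,R_\beta).$$

One has by simple manipulations that

(3) $$g_\beta(j) + f_\beta(i,j) = \min_K \Big\{ C(i,K) + \beta \sum_s P(i,s:K)\, f_\beta(s,j) \Big\}$$

where $g_\beta(j) = (1 - \beta)\, \psi(j,\beta,R_\beta)$.

Note that $|g_\beta(j)| < M$ for all $\beta, j$.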
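The "simple manipulations" can be written out as follows. This is a sketch of one way to obtain (3) from (1) and (2), writing $\beta$ for the discount factor and $\psi$ for the optimal discounted cost; the only fact used beyond (1) and (2) is that $\sum_s P(i,s:K) = 1$.

```latex
\begin{align*}
f_\beta(i,j)
 &= \psi(i,\beta,R_\beta) - \psi(j,\beta,R_\beta) \\
 &= \min_K \Big\{ C(i,K) + \beta \sum_s P(i,s:K)\,\psi(s,\beta,R_\beta) \Big\}
    - \psi(j,\beta,R_\beta) \\
 &= \min_K \Big\{ C(i,K) + \beta \sum_s P(i,s:K)\,
      \big[ f_\beta(s,j) + \psi(j,\beta,R_\beta) \big] \Big\}
    - \psi(j,\beta,R_\beta) \\
 &= \min_K \Big\{ C(i,K) + \beta \sum_s P(i,s:K)\, f_\beta(s,j) \Big\}
    - (1-\beta)\,\psi(j,\beta,R_\beta).
\end{align*}
```

Moving the last term to the left side gives (3) with $g_\beta(j) = (1-\beta)\,\psi(j,\beta,R_\beta)$. Since costs are non-negative and bounded by $M$, $0 \le \psi \le M/(1-\beta)$, which gives the bound on $g_\beta(j)$.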

We need the following Assumption.

Assumption (*): For some sequence $\beta_r \to 1^-$ there exists a
constant $N < \infty$ such that

$$|f_{\beta_r}(i,j)| < N \quad \text{for all } r = 1,2,\ldots \text{ and all } i,j \in I.$$

Theorem 1.1: If Assumption (*) holds then there exists a bounded solution
to the functional equation

(4) $$g + f(i) = \min_K \Big\{ C(i,K) + \sum_j P(i,j:K)\, f(j) \Big\}, \quad i \in I.$$

Proof: Fix some state $s$. By Assumption (*) $f_{\beta_r}(i,s)$ is uniformly bounded
for $r = 1,2,\ldots$ and all $i \in I$. Since $I$ is denumerable we can get a
subsequence $\{\beta_{r'}\}_{r'=1}^{\infty}$ such that $f_{\beta_{r'}}(i,s) \to f(i)$ for all $i$.
Since $g_\beta(s)$ is bounded for all $\beta$, we can also require that
$g_{\beta_{r'}}(s) \to g$ as $\beta_{r'} \to 1$.

$\therefore$ by (3) and the bounded convergence theorem we have that

$$g + f(i) = \min_K \Big\{ C(i,K) + \sum_j P(i,j:K)\, f(j) \Big\}. \quad \text{QED}$$
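The limiting argument can be watched numerically on a finite model. The sketch below is an illustration only: the three-state machine (`C`, `P`), the value $\beta = 0.999$ and the function names are hypothetical. It computes the discounted values $\psi$, forms $g_\beta = (1-\beta)\psi(s)$ and $f_\beta(i) = \psi(i) - \psi(s)$ for a reference state $s$, and measures how far the pair is from satisfying the undiscounted functional equation.

```python
def discounted_values(C, P, beta, tol=1e-9):
    """Successive approximations for the optimal beta-discounted cost."""
    n = len(C)
    psi = [0.0] * n
    while True:
        new = [min(C[i][k] + beta * sum(P[i][k][j] * psi[j] for j in range(n))
                   for k in range(len(C[i]))) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, psi)) < tol:
            return new
        psi = new


def vanishing_discount(C, P, beta, s=0):
    """Return (g_beta, f_beta) relative to the reference state s."""
    psi = discounted_values(C, P, beta)
    return (1.0 - beta) * psi[s], [v - psi[s] for v in psi]


def residual(C, P, g, f):
    """Largest violation of g + f(i) = min_K {C(i,K) + sum_j P(i,j:K) f(j)}."""
    n = len(C)
    return max(abs(g + f[i] - min(C[i][k] + sum(P[i][k][j] * f[j] for j in range(n))
                                  for k in range(len(C[i]))))
               for i in range(n))


# Hypothetical machine: action 0 = keep, action 1 = replace (return to state 0).
C = [[1.0, 4.0], [2.0, 4.0], [6.0, 4.0]]
KEEP = [[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]
P = [[KEEP[i], [1.0, 0.0, 0.0]] for i in range(3)]

g, f = vanishing_discount(C, P, beta=0.999)

if __name__ == "__main__":
    print(g, f, residual(C, P, g, f))
```

For this model the best stationary rule (keep in state 0, replace otherwise) has average cost $13/7$, and $g_\beta$ is already close to that value at $\beta = 0.999$; the residual is of order $(1-\beta)\max_i |f_\beta(i)|$.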
Remark: If $\psi(i,\beta,R_\beta)$ is an increasing function of $i$ for each $\beta$,
then $f(i)$ is an increasing function of $i$.

A special version of the next theorem was originally proven by
Taylor [9]. Derman [4] proved it under the assumption that $K_i < \infty$
for all $i$; later in a joint paper with Lieberman [3] a proof not
assuming this was given.

We shall give here a simple proof which follows a technique used
in [9].

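Theorem 1.2: If there exists a bounded solution to the functional
equation

(5) $$g + f(i) = \min_K \Big\{ C(i,K) + \sum_j P(i,j:K)\, f(j) \Big\}, \quad i \in I,$$

then there exists a stationary deterministic rule $R^*$ such that

$$g = \varphi(i,R^*) = \min_R \varphi(i,R) \quad \text{for all } i,$$

and $R^*$ is the rule which, for each $i$, prescribes an action which
minimizes the right side of (5).

Proof: For any rule $R$,

$$E_R \Big\{ \sum_{t=1}^{n} \big[ f(X_t) - E_R\big(f(X_t) \mid S_{t-1}\big) \big] \Big\} = 0,$$

where $X_t$ = state at time $t$, and
$S_{t-1} = (X_0,A_0,\ldots,X_{t-1},A_{t-1})$ = history up to time $t$.

Now

$$\begin{aligned}
E_R\big(f(X_t) \mid S_{t-1}\big) &= \sum_j P(X_{t-1},j:A_{t-1})\, f(j) \\
&= C(X_{t-1},A_{t-1}) + \sum_j P(X_{t-1},j:A_{t-1})\, f(j) - C(X_{t-1},A_{t-1}) \\
&\ge \min_K \Big\{ C(X_{t-1},K) + \sum_j P(X_{t-1},j:K)\, f(j) \Big\} - C(X_{t-1},A_{t-1}) \\
&= g + f(X_{t-1}) - C(X_{t-1},A_{t-1}),
\end{aligned}$$

with equality for $R^*$ since $R^*$ is defined to take the minimizing action.

$$\therefore\ 0 \le E_R \sum_{t=1}^{n} \big\{ f(X_t) - g - f(X_{t-1}) + C(X_{t-1},A_{t-1}) \big\},$$

with equality for $R^*$.

$$\therefore\ g \le \frac{E_R f(X_n)}{n} - \frac{E_R f(X_0)}{n} + \frac{E_R \sum_{t=1}^{n} C(X_{t-1},A_{t-1})}{n},$$

with equality for $R^*$.

Letting $n \to \infty$ and using the fact that $f$ is bounded, we have $g \le \varphi(X_0,R)$
with equality for $R^*$ and for all possible values of $X_0$. QED.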

Note: the above proof doesn't make use of the fact that $K_i < \infty$ or that
$C(i,K)$ is bounded. Also it shows that

$$\lim_{n} E_{R^*} \sum_{t=0}^{n} \frac{C(X_t,A_t)}{n+1}$$

exists and equals $g$, and that

$$g \le \liminf_{n} E_R \sum_{t=0}^{n} \frac{C(X_t,A_t)}{n+1} \quad \text{for every rule } R.$$

Thus, in this case, the fact that the average cost was defined by
the lim sup as opposed to the lim inf is irrelevant.

For any rule $R \in C''$ let $i(R)$ = action $R$ chooses when in state $i$,
i.e. $D_{i,i(R)} = 1$.

Def: For rules $R_n, R \in C''$ we say that $R_n$ converges to $R$ ($R_n \to R$, or
$\lim_n R_n = R$) if for each $i$ there exists $N_i$ such that $i(R_n) = i(R)$ for
all $n \ge N_i$.

Note that any countable sequence of rules $R_n \in C''$ has a convergent
subsequence.

The following theorem shows the relationship between $R^*$ and the
$\beta$-discount optimal rules $R_\beta$.

Theorem 1.3: If Assumption (*) holds then

(i) for some subsequence $\{\beta_{r'}\}_{r'=1}^{\infty}$ of $\{\beta_r\}_{r=1}^{\infty}$,
$R^* = \lim_{r'} R_{\beta_{r'}}$;

(ii) if $R = \lim_{r'} R_{\beta_{r'}}$ where $\{\beta_{r'}\}$ is a subsequence of $\{\beta_r\}_{r=1}^{\infty}$,
then $R$ is optimal, i.e. $\varphi(i,R) = g$ for all $i \in I$.

Proof: (i) Let $\{\beta_{r'}\}_{r'=1}^{\infty}$ be the subsequence for which
$f_{\beta_{r'}}(i,s) \to f(i)$ for all $i$ and $g_{\beta_{r'}}(s) \to g$, as in the proof of
Theorem 1.1. It is easily seen from the definition of $f_\beta(i,j)$ that $R_{\beta_{r'}}$ takes the
actions which minimize $C(i,K) + \beta_{r'} \sum_j P(i,j:K)\, f_{\beta_{r'}}(j,s)$, but $R^*$ takes
the actions which minimize $C(i,K) + \sum_j P(i,j:K)\, f(j)$. The result
follows since $K_i < \infty$.

(ii) For any sequence $\{\beta_{r'}\}_{r'=1}^{\infty}$ we can get a subsequence $\{\beta_{r''}\}_{r''=1}^{\infty}$
for which $\lim f_{\beta_{r''}}(i,s)$ and $\lim g_{\beta_{r''}}(s)$ exist. Denoting the limit by $f(i)$,
it follows from Theorem 1.2 that any rule which minimizes
$C(i,K) + \sum_j P(i,j:K)\, f(j)$ is optimal. But $R_{\beta_{r''}}$ minimizes
$C(i,K) + \beta_{r''} \sum_j P(i,j:K)\, f_{\beta_{r''}}(j,s)$ and $R_{\beta_{r''}} \to R$.

$\therefore$ $R$ is optimal. QED

Thus we see that if $K_i < \infty$ for all $i \in I$ and Assumption (*) holds then
there exists an optimal stationary deterministic rule which is a limit
point of $\{R_\beta : 0 < \beta < 1\}$, and any rule which is a limit point of $\{R_{\beta_r}\}$
is optimal.

The following theorem gives a sufficient condition for Assumption (*)
to hold.

Theorem 1.4: If for some state $j$ and sequence $\beta_r \to 1$ there is a
constant $N < \infty$ such that $M_{ij}(R_{\beta_r}) < N$ for all $i \in I$, $r = 1,2,\ldots$,
then Assumption (*) holds; where $M_{ij}(R_{\beta_r})$ is the mean recurrence time to go
from state $i$ to state $j$ when using the $\beta_r$-optimal discount rule $R_{\beta_r}$.

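Proof: Consider the fixed rule $R_{\beta_r}$. Suppose the process starts
at state $i$. Let $t$ = time it takes to first get to state $j$.

Now $\psi(i,\beta_r,R_{\beta_r}) = E_{R_{\beta_r}}(C_1) + E_{R_{\beta_r}}(C_2)$,

where $C_1$ = discounted costs incurred before one gets to
state $j$ $= \sum_{n=0}^{t-1} C(X_n,A_n)\, \beta_r^n$, and

$C_2$ = discounted costs incurred after one gets to
state $j$ $= \sum_{n=t}^{\infty} C(X_n,A_n)\, \beta_r^n$.

$$\therefore\ \psi(i,\beta_r,R_{\beta_r}) \le M\, E\,t + \psi(j,\beta_r,R_{\beta_r})\, E_{R_{\beta_r}}(\beta_r^t) \le MN + \psi(j,\beta_r,R_{\beta_r})$$

(recall that all costs are non-negative and bounded by $M$)

$$\therefore\ \psi(i,\beta_r,R_{\beta_r}) - \psi(j,\beta_r,R_{\beta_r}) \le MN \quad \text{for all } i,\ r = 1,2,\ldots$$

Again,

$$\psi(i,\beta_r,R_{\beta_r}) = E_{R_{\beta_r}}(C_1) + E_{R_{\beta_r}}(C_2) \ge E_{R_{\beta_r}}(C_2) = \psi(j,\beta_r,R_{\beta_r})\, E_{R_{\beta_r}}(\beta_r^t)$$

$$\therefore\ \psi(j,\beta_r,R_{\beta_r}) \le \psi(i,\beta_r,R_{\beta_r}) + \big[1 - E_{R_{\beta_r}}(\beta_r^t)\big]\, \psi(j,\beta_r,R_{\beta_r}).$$

Now

$$\psi(j,\beta_r,R_{\beta_r}) \le \frac{M}{1 - \beta_r}, \qquad E(\beta_r^t) \ge \beta_r^{E t} \ge \beta_r^{N} \ \text{ by Jensen's Inequality.}$$

$$\therefore\ \big[1 - E_{R_{\beta_r}}(\beta_r^t)\big]\, \psi(j,\beta_r,R_{\beta_r}) \le \frac{1 - \beta_r^N}{1 - \beta_r}\, M \le NM$$

$$\therefore\ |\psi(j,\beta_r,R_{\beta_r}) - \psi(i,\beta_r,R_{\beta_r})| \le MN \quad \text{for all } r = 1,2,\ldots,\ i \in I$$

$$\therefore\ |\psi(s,\beta_r,R_{\beta_r}) - \psi(i,\beta_r,R_{\beta_r})| \le 2MN \quad \text{for all } r = 1,2,\ldots,\ i,s \in I$$

$\therefore$ Assumption (*) holds. QED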
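Two elementary facts carry the second half of this proof. The following sketch spells them out, writing $\beta$ for the discount factor $\beta_r$, $t$ for the first passage time to state $j$, and assuming $N \ge 1$ (which holds because a mean passage time between distinct states is at least 1).

```latex
% Jensen: x -> beta^x is convex for 0 < beta < 1, and E t <= N, so
\[
  E\big(\beta^{t}\big) \;\ge\; \beta^{E t} \;\ge\; \beta^{N}.
\]
% Bernoulli's inequality (1 - x)^N >= 1 - N x for N >= 1, 0 <= x <= 1,
% applied with x = 1 - beta:
\[
  \beta^{N} \;\ge\; 1 - N(1-\beta)
  \quad\Longrightarrow\quad
  \frac{1-\beta^{N}}{1-\beta} \;\le\; N .
\]
% Combining with psi(j) <= M / (1 - beta):
\[
  \big[\,1 - E(\beta^{t})\,\big]\,\psi(j,\beta,R_\beta)
  \;\le\; \big(1-\beta^{N}\big)\,\frac{M}{1-\beta}
  \;\le\; NM .
\]
```

When $N$ is an integer the middle bound is also immediate from $(1-\beta^N)/(1-\beta) = 1 + \beta + \cdots + \beta^{N-1}$.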


Lemma 1.5: If for some $\alpha > 0$ and some state $j$

$$P(i,j:K) \ge \alpha \quad \text{for all } i \in I,\ K \in K_i,$$

then Assumption (*) holds and there exists a
stationary deterministic optimal rule which is ...

Assumption: $$\sup_{j \in I}\ \inf_{i \in I,\ K \in K_i} P(i,j:K) > 0.$$

Note this is so if and only if there is a state $j$ and $\alpha > 0$ such that
$P(i,j:K) \ge \alpha$ for all $i \in I$, $K \in K_i$. For the sake of definiteness denote the
state $j$ for which the above holds by state 0. By Lemma 1.5 there exists a
stationary deterministic optimal rule for this process.

Consider now a new process (the prime process) with identical state
and action spaces but with transition probabilities now given by

$$P'(i,j:K) = \begin{cases} \dfrac{P(i,j:K)}{1 - \alpha} & j \ne 0 \\[2ex] \dfrac{P(i,0:K) - \alpha}{1 - \alpha} & j = 0. \end{cases}$$

Denote by $\psi'(i,\beta,R)$ the total expected $\beta$-discounted cost when using
rule $R$ with respect to the new (prime) process.

Note that any rule for the prime process can also be considered as a rule
for the original process and vice versa.

The fundamental theorem in the reduction is the following:

Theorem 2.1: For any stationary rule $R$

$$\varphi(0,R) = \alpha\, \psi'(0, 1 - \alpha, R).$$

Proof: In the original problem we shall think of the transitions as
taking place in 2 stages. During stage 1 a coin with probability $\alpha$ of
coming up heads is flipped. If heads comes up then the process goes to
state 0; if not then the process moves to the next state according to
the second stage transition probabilities, which are the transition
probabilities necessary in order to make the total transition
probability what it should be, i.e. if action $K$ is chosen then the total
probabilities must be $P(i,j:K)$. Note that the above is legitimate since
$P(i,0:K) \ge \alpha$ for all $i,K$. Note also that the desired second-stage
transition probabilities are exactly the transition probabilities of the
prime problem.

Define a cycle as the time between successive occurrences of heads.

Let $T$ = time of cycle.

Then it is well known (follows from the Strong Law of Large Numbers and
the Bounded Convergence Theorem) that for any stationary $R$

$$\begin{aligned}
\varphi(0,R) &= E_R \sum_{t=0}^{T-1} C(X_t,A_t) \Big/ E_R T \\
&= E_R\Big( E_R\Big[ \sum_{t=0}^{T-1} C(X_t,A_t) \,\Big|\, T \Big] \Big) \Big/ (1/\alpha) \\
&= \sum_{i=1}^{\infty} \alpha (1-\alpha)^{i-1} \sum_{t=0}^{i-1} E_R\big[ C(X_t,A_t) \mid T = i \big] \Big/ (1/\alpha).
\end{aligned}$$

Now conditioning on $T = i$ means that the transition probabilities used
during times $0,1,\ldots,i-2$ were the 2nd stage transition probabilities,
i.e. the transition probabilities of the prime problem.

$\therefore$ for $t \le i - 1$, $E_R[C(X_t,A_t) \mid T = i] = E'_R\, C(X_t,A_t)$,

where $E'_R$ denotes the expected cost with respect to the prime problem.

$$\begin{aligned}
\therefore\ \varphi(0,R) &= \alpha \sum_{i=1}^{\infty} \alpha (1-\alpha)^{i-1} \sum_{t=0}^{i-1} E'_R\, C(X_t,A_t) \\
&= \alpha \sum_{t=0}^{\infty} E'_R\, C(X_t,A_t) \sum_{i=t+1}^{\infty} \alpha (1-\alpha)^{i-1} \quad \text{(since everything is non-negative)} \\
&= \alpha \sum_{t=0}^{\infty} E'_R\, C(X_t,A_t)\, (1-\alpha)^{t} \\
&= \alpha\, \psi'(0, 1-\alpha, R).
\end{aligned}$$

QED

Note that since $P(i,0:K) \ge \alpha$ for all $i,K$ it follows that for any
stationary rule $R$

$$\varphi(i,R) = \varphi(0,R),$$

and since a stationary deterministic optimal rule does exist it follows
from the above theorem that this rule is precisely the optimal $(1-\alpha)$-
discount rule with respect to the prime problem.

Letting $V(i) = \min_R \psi'(i, 1-\alpha, R)$ we have that

$$V(i) = \min_K \Big\{ C(i,K) + (1-\alpha) \sum_j P'(i,j:K)\, V(j) \Big\}, \quad i \in I,$$

or

$$V(i) = \min_K \Big\{ C(i,K) + \sum_j P(i,j:K)\, V(j) - \alpha V(0) \Big\}, \quad i \in I,$$

and the optimal policy is the one which chooses the minimizing actions.
Let $B(I)$ = space of all bounded functions on $I$; then defining the
operator $T : B(I) \to B(I)$ by

$$(TU)(i) = \min_K \Big\{ C(i,K) + \sum_j P(i,j:K)\, U(j) - \alpha U(0) \Big\}, \quad i \in I,$$

then under the supremum norm $\|U\| = \sup_{i \in I} |U(i)|$
we have that $T$ is a contraction mapping with unique fixed point $V$, and
$V$ can be found by the simple to apply method of successive approximations,
i.e. for any $U \in B(I)$, $\lim_n T^n U = V$.

Thus if our assumption holds then we have reduced the average-cost
problem to a discounted-cost problem, and any of the well-known methods
of successive approximations or policy improvements (see [1] for details)
can be applied.

Note that policy improvements for this $(1-\alpha)$-discount problem are, by
virtue of Theorem 2.1, also policy improvements for the original average-
cost problem.
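The reduction can be tried out on a finite model. The following is a minimal sketch under stated assumptions: the three-state model (`C`, `P`), the value $\alpha = 0.2$ and the function names are hypothetical, chosen so that every action moves to state 0 with probability at least $\alpha$. It iterates the operator $(TU)(i) = \min_K \{C(i,K) + \sum_j P(i,j:K)U(j) - \alpha U(0)\}$ to its fixed point $V$, reads off $g = \alpha V(0)$, and compares $g$ with the long-run average cost of the resulting rule computed independently.

```python
def reduced_average_cost(C, P, alpha, tol=1e-12):
    """Successive approximations for V = TV; returns (g, V, rule)."""
    n = len(C)

    def q(i, k, U):
        # Bracketed expression of the operator T for state i and action k.
        return C[i][k] + sum(P[i][k][j] * U[j] for j in range(n)) - alpha * U[0]

    U = [0.0] * n
    while True:
        new = [min(q(i, k, U) for k in range(len(C[i]))) for i in range(n)]
        done = max(abs(a - b) for a, b in zip(new, U)) < tol
        U = new
        if done:
            break
    rule = [min(range(len(C[i])), key=lambda k, i=i: q(i, k, U)) for i in range(n)]
    return alpha * U[0], U, rule


def stationary_average_cost(C, P, rule, steps=500):
    """Long-run average cost of a stationary deterministic rule, from state 0."""
    n = len(C)
    dist = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][rule[i]][j] for i in range(n)) for j in range(n)]
    return sum(dist[i] * C[i][rule[i]] for i in range(n))


# Hypothetical model with P(i, 0 : K) >= 0.2 for every state and action.
ALPHA = 0.2
C = [[1.0, 4.0], [2.0, 4.0], [6.0, 4.0]]
KEEP = [[0.7, 0.3, 0.0], [0.2, 0.5, 0.3], [0.2, 0.0, 0.8]]
P = [[KEEP[i], [1.0, 0.0, 0.0]] for i in range(3)]

g, V, rule = reduced_average_cost(C, P, ALPHA)

if __name__ == "__main__":
    print(g, rule, stationary_average_cost(C, P, rule))
```

The fixed point satisfies $\alpha V(0) + V(i) = \min_K\{C(i,K) + \sum_j P(i,j:K)V(j)\}$, which is the functional equation (5) with $g = \alpha V(0)$ and $f = V$; this is why the two printed numbers agree.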

3. On $\varepsilon$-optimal Rules

It is known (see [4]) that even under the conditions that $K_i < \infty$
for all $i$ and $C(i,K)$ is uniformly bounded there need not exist an
optimal rule in the "average cost" sense. Also there may exist an
optimal rule but there may be no stationary deterministic rule which
is optimal.

This brings up the question whether there always exist $\varepsilon$-optimal
stationary deterministic rules. We say that $R \in C''$ is $\varepsilon$-optimal for
state $i$ if

$$\varphi(i,R) < g(i) + \varepsilon, \quad \text{where } g(i) = \inf_R \varphi(i,R).$$

We say that $R \in C''$ is $\varepsilon$-optimal if it is $\varepsilon$-optimal for every state $i$.

One possible source of $\varepsilon$-optimal stationary deterministic rules is the
optimal $\beta$-discount rules $\{R_\beta : 0 < \beta < 1\}$. One might conjecture that,
for any state $i$, these rules are $\varepsilon$-optimal for state $i$ in the
sense that

$$\liminf_{\beta \to 1} \varphi(i,R_\beta) = g(i).$$


The following counter-example shows that this need not be the case.

Let $I = \{(i,j) : j = 0,1,\ldots,i;\ i = 1,2,\ldots\} \cup \{\infty\}$,

$$K_{(i,j)} = \begin{cases} 2 & j = 0 \\ 1 & j \ne 0, \end{cases} \qquad K_\infty = 1.$$

The costs depend only on the state:

$$C((i,j),\cdot) = \begin{cases} 1 & j = 0 \\ 0 & j \ne 0, \end{cases} \qquad C(\infty,\cdot) = 2.$$

The transition probabilities are as follows:

$P((i,0),(i+1,0):1) = 1$
$P((i,0),(i,1):2) = 1$
$P((i,j),(i,j+1):1) = 1$ for $0 < j < i$
$P((i,i),\infty:1) = 1$
$P(\infty,\infty:1) = 1$

In words, when in state $(i,0)$ we can choose to go to state $(i+1,0)$
at the cost of 1 unit for the next stage, or we can elect to pay
0 units for the next $i$ stages and 2 units for every stage after that.
Let $X_0 = (1,0)$.

Let $R_0$ = rule which takes action 1 at all states; then it is easy
to see that

$\varphi((1,0),R_0) = 1$,

$R$ stationary deterministic, $R \ne R_0$ $\Rightarrow$ $\varphi((1,0),R) = 2$.

We now show that $R_0$ is not a $\beta$-optimal rule for any $\beta$ ($0 < \beta < 1$)
and thus $\varphi((1,0),R_\beta) = 2$ for all $\beta \in (0,1)$,
while $\inf_R \varphi((1,0),R) = 1$.

Now $\psi((1,0),\beta,R_0) = \dfrac{1}{1-\beta}$ since the cost at each stage is 1.

Let $R_n$ = rule which takes action 2 at state $(n,0)$ and action 1 elsewhere. Then

$$\psi((1,0),\beta,R_n) = \sum_{i=0}^{n-1} 1 \cdot \beta^i + \sum_{i=n}^{2n-1} 0 \cdot \beta^i + \sum_{i=2n}^{\infty} 2 \cdot \beta^i = \frac{1 - \beta^n + 2\beta^{2n}}{1 - \beta}.$$

Now for $n$ large $2\beta^{2n} - \beta^n < 0$,

$\therefore$ for $n$ large $\psi((1,0),\beta,R_n) < \dfrac{1}{1-\beta}$,

$\therefore$ for each $\beta$, $R_\beta \ne R_0$,

$\therefore$ $\varphi((1,0),R_\beta) = 2$ for all $\beta$

and $\inf_R \varphi((1,0),R) = 1$. QED
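The comparison in this counter-example is easy to check numerically. The sketch below uses hypothetical function names and writes $\beta$ for the discount factor. It evaluates the closed form $(1 - \beta^n + 2\beta^{2n})/(1-\beta)$ for the rule that switches at state $(n,0)$, cross-checks it against a direct stage-by-stage sum, and finds the smallest $n$ for which switching beats never switching (cost $1/(1-\beta)$).

```python
def cost_never_switch(beta):
    """Discounted cost of R_0: one unit at every stage."""
    return 1.0 / (1.0 - beta)


def cost_switch_at(n, beta):
    """Closed form for R_n: cost 1 for n stages, 0 for n stages, 2 thereafter."""
    return (1.0 - beta ** n + 2.0 * beta ** (2 * n)) / (1.0 - beta)


def cost_switch_at_by_summation(n, beta, horizon=20000):
    """Same quantity summed stage by stage (truncated at `horizon`)."""
    total = 0.0
    for t in range(horizon):
        stage_cost = 1.0 if t < n else (0.0 if t < 2 * n else 2.0)
        total += stage_cost * beta ** t
    return total


def first_improving_n(beta):
    """Smallest n with cost_switch_at(n, beta) < cost_never_switch(beta)."""
    n = 1
    while cost_switch_at(n, beta) >= cost_never_switch(beta):
        n += 1
    return n


if __name__ == "__main__":
    for beta in (0.5, 0.9, 0.99):
        print(beta, first_improving_n(beta))
```

Switching is strictly better exactly when $\beta^n < 1/2$, so the threshold grows as $\beta \to 1$ but is finite for every $\beta < 1$; this is why no $R_\beta$ can coincide with $R_0$.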
Thus it is not necessarily true that the $\beta$-optimal discount rules are
$\varepsilon$-optimal, or even $\varepsilon$-optimal for a specific state. We shall now give
sufficient conditions for $R_\beta$ to be

(i) $\varepsilon$-optimal for a particular state (for $\beta$ near 1)

(ii) $\varepsilon$-optimal for all $\beta$ near 1

Theorem 3.1: If for some sequence $\beta_r \to 1^-$ there exists an $N < \infty$ such that

$$\psi(j,\beta_r,R_{\beta_r}) - \psi(i,\beta_r,R_{\beta_r}) < N \quad \text{for all } j \in I,\ r = 1,2,\ldots,$$

then

$$\lim_{\beta_r \to 1} \varphi(i,R_{\beta_r}) = g(i) = \inf_R \varphi(i,R),$$

and so, for $r$ large, $R_{\beta_r}$ is $\varepsilon$-optimal for state $i$.

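Proof: Let $V_\beta(j) = \psi(j,\beta,R_\beta)$;

then $$V_\beta(j) = \min_K \Big\{ C(j,K) + \beta \sum_l P(j,l:K)\, V_\beta(l) \Big\}$$

and $R_\beta$ takes the minimizing actions.

Now $$E_{R_\beta} \sum_{t=1}^{n} \big[ V_\beta(X_t) - E_{R_\beta}\big(V_\beta(X_t) \mid S_{t-1}\big) \big] = 0$$

and

$$\begin{aligned}
E_{R_\beta}\big[ V_\beta(X_t) \mid S_{t-1} \big] &= \sum_j P(X_{t-1},j:A_{t-1})\, V_\beta(j) \\
&= \beta \sum_j P(X_{t-1},j:A_{t-1})\, V_\beta(j) + C(X_{t-1},A_{t-1}) \\
&\qquad - C(X_{t-1},A_{t-1}) + (1-\beta) \sum_j P(X_{t-1},j:A_{t-1})\, V_\beta(j) \\
&= V_\beta(X_{t-1}) - C(X_{t-1},A_{t-1}) + (1-\beta) \sum_j P(X_{t-1},j:A_{t-1})\, V_\beta(j).
\end{aligned}$$

$$\therefore\ 0 = \frac{E_{R_\beta}\big( V_\beta(X_n) - V_\beta(X_0) \big)}{n} + \frac{E_{R_\beta} \sum_{t=1}^{n} C(X_{t-1},A_{t-1})}{n} - (1-\beta)\, \frac{E_{R_\beta} \sum_{t=1}^{n} V_\beta(X_t)}{n}.$$

Now using our condition we have that $V_{\beta_r}(X_t) \le V_{\beta_r}(i) + N$ for all $t$,

$$\therefore\ \frac{E_{R_{\beta_r}} \sum_{t=1}^{n} C(X_{t-1},A_{t-1})}{n} \le (1-\beta_r)\, V_{\beta_r}(i) + (1-\beta_r) N - \frac{E_{R_{\beta_r}}\big( V_{\beta_r}(X_n) - V_{\beta_r}(X_0) \big)}{n}.$$

Letting $n \to \infty$ we have, since $|V_{\beta_r}(X_n) - V_{\beta_r}(X_0)| \le \dfrac{M}{1-\beta_r}$,

that $\varphi(X_0,R_{\beta_r}) \le (1-\beta_r)\, V_{\beta_r}(i) + (1-\beta_r) N$ for any $X_0$.

Now for any rule $R$

$$(1-\beta)\, V_\beta(i) \le (1-\beta)\, \psi(i,\beta,R)$$

$$\therefore\ \limsup_{r \to \infty} (1-\beta_r)\, V_{\beta_r}(i) \le \limsup_{r \to \infty} (1-\beta_r)\, \psi(i,\beta_r,R) \le \varphi(i,R),$$

where the second inequality follows from the Tauberian result that

$$\limsup_{x \to 1^-} (1-x) \sum_{n=0}^{\infty} a_n x^n \le \limsup_{n} \frac{1}{n} \sum_{i=1}^{n} a_i \quad \text{[see Titchmarsh, p. 227].}$$

$\therefore$ $\limsup_{r \to \infty} \varphi(X_0,R_{\beta_r}) \le \varphi(i,R)$ for any $R$, any $X_0$

$\therefore$ $\limsup_{r \to \infty} \varphi(X_0,R_{\beta_r}) \le g(i)$ for any $X_0$

$\therefore$ $\lim_{r \to \infty} \varphi(i,R_{\beta_r}) = g(i)$. QED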

Corollary 3.2: If for some sequence $\beta_r \to 1^-$ there exists $N_i < \infty$
for each $i \in I$ such that $\psi(j,\beta_r,R_{\beta_r}) - \psi(i,\beta_r,R_{\beta_r}) < N_i$ for all $r, j$,

then (i) $\lim_{r \to \infty} \varphi(i,R_{\beta_r}) = g(i)$ for all $i$, and the convergence
is uniform in $i$, and thus $R_{\beta_r}$ is $\varepsilon$-optimal for $r$ large;

(ii) $g(i) = g(j) = g$ for all $i,j$.

Proof: We first prove (ii).

From the proof of the previous theorem we have that
$\limsup_{r \to \infty} \varphi(X_0,R_{\beta_r}) \le g(i)$ for any $X_0$, any $i$, and also
that $g(X_0) = \lim_{r \to \infty} \varphi(X_0,R_{\beta_r})$ for any $X_0$.

$\therefore$ $g(X_0) \le g(i)$ for any $i, X_0$

$\therefore$ $g(i) = g(j) = g$ for all $i,j$.

(i) The convergence is an immediate result of Theorem 3.1.

To show uniformity, fix some state $i_0$.

The previous theorem yields that

$$\limsup_{r} (1-\beta_r)\, V_{\beta_r}(i_0) \le g(i_0)$$

and $\varphi(j,R_{\beta_r}) \le (1-\beta_r)\, V_{\beta_r}(i_0) + N_{i_0}(1-\beta_r)$ for any state $j$.

For $\varepsilon > 0$, let $r_0$ be such that $r > r_0$ implies

(1) $(1-\beta_r)\, V_{\beta_r}(i_0) < g(i_0) + \varepsilon/2$, and also

(2) $(1-\beta_r)\, N_{i_0} < \varepsilon/2$.

$\therefore$ $r > r_0 \Rightarrow \varphi(j,R_{\beta_r}) < g(i_0) + \varepsilon/2 + \varepsilon/2 = g(i_0) + \varepsilon$ for any $j$,

but $g(i_0) = g$ and so convergence is uniform. QED

Note that the condition in the above corollary is weaker than the
condition that there exists $N < \infty$ such that
$|\psi(j,\beta_r,R_{\beta_r}) - \psi(i,\beta_r,R_{\beta_r})| < N$
for all $r,i,j$. This latter condition is Assumption (*).
From Theorem 1.4 we thus have

Theorem 3.3: If for some state $j$ there is $N < \infty$
such that $M_{ij}(R_{\beta_r}) < N$ for all $r = 1,2,\ldots$ and all $i$,
then the condition for Corollary 3.2 is satisfied and
thus $g = \lim_{r \to \infty} \varphi(i,R_{\beta_r})$ uniformly in $i$.

Proof: see proof of Theorem 1.4.

Putting Corollary 3.2 together with Theorems 1.1, 1.2 and 1.3
we have

Theorem 3.4: If Assumption (*) holds then there exists a stationary
deterministic optimal rule which is a limit point of
the optimal $\beta_r$-discount rules, and for any $\varepsilon$ the
$\beta_r$-discount rules are $\varepsilon$-optimal for $r$ large.

4. Replacement Process

Definition: A Markovian Replacement Process is a Markovian Decision
Process with a distinguished state (call it state 0) and a
distinguished action (call it $a_0$) such that

(i) $X_0 = 0$

(ii) $$P(i,j:a_0) = \begin{cases} 1 & j = 0 \\ 0 & \text{otherwise.} \end{cases}$$

Let $g = \inf_R \varphi(0,R)$. Since $X_0 = 0$ we shall write $\varphi(R)$ for $\varphi(0,R)$.

Let $R_\beta$ be the $\beta$-optimal discount rule.

As an immediate consequence of Theorem 3.1 we have

Theorem 4.1: In the replacement process

$$\lim_{\beta \to 1^-} \varphi(R_\beta) = g.$$

Proof:
$$\begin{aligned}
\psi(i,\beta,R_\beta) &= \min_K \Big\{ C(i,K) + \beta \sum_j P(i,j:K)\, \psi(j,\beta,R_\beta) \Big\} \\
&\le C(i,a_0) + \beta\, \psi(0,\beta,R_\beta) \\
&\le M + \psi(0,\beta,R_\beta) \quad \text{for all } i, \text{ all } \beta,
\end{aligned}$$

and so the result follows from Theorem 3.1. QED
[Recall that since $C(i,K)$ is assumed bounded we can thus also assume
without loss of generality that costs are non-negative and
$\psi(0,\beta,R_\beta) \ge 0$.]

The following corollary is immediate.

Corollary 4.2: (i) There exist $\varepsilon$-optimal stationary deterministic
rules for the replacement problem.

(ii) If $R$ is optimal among the stationary deterministic
rules then $R$ is optimal (for the replacement problem).

Def: We say that rule $R$ is a Markov rule if the action it chooses at
time $t$ depends on the past history only through the state at time $t$, and $t$,
i.e. $D_K(X_0,A_0,\ldots,X_t = i) = D_{i,K}(t)$.

We say that $R$ is non-random Markov if it is Markov and non-random.

Theorem 4.3: For the replacement model there exists a non-random
Markov rule which is optimal.

Proof: For each $n$, let $R_n$ be a stationary deterministic rule ...

Let $R$, a non-random Markov rule, be as follows:

use $R_1$ for $t = 1,\ldots,N_1$, then take action $a_0$;

use $R_2$ for the next $N_2$ stages, then take $a_0$;

...

use $R_n$ for the next $N_n$ stages, then take $a_0$;
etc.

Claim: $\varphi(R) = g$.

For any $\varepsilon > 0$, let $j$ be such that $1/j < \varepsilon$.

We shall show that

$$n > N_1 + \ldots + N_j + j \ \Rightarrow\ \frac{E_R \sum_{t=1}^{n} C(X_{t-1},A_{t-1})}{n} < g + \varepsilon.$$

Now if $n > N_1 + \ldots + N_j + j$ then for some $k \ge j$

$$N_1 + \ldots + N_k + k < n \le N_1 + \ldots + N_k + N_{k+1} + k + 1.$$


now 


N,. 


n-EN.-k 

1 


(N 


1  +  +  Nk-1+  k>M  +  \  J  C(Xt-lAt-l)  +  \+1  X  C(Xt-l'At-l> 


there  are  2  cases 


Case  (i):  n  ■  -  k  <  Nk+1 


Case  (ii):  n  -  2  ^  -  k  >  Nk+1 


n 


If  (i)  then 


*8  l 

n 


8,. 


<  V  •  ■ • +  "k-i+  k)M  +  \  l  c(xt-i'At-i)  +  Hk+1 M 


N,  +  ...  +  N.  .  +  k  +  N,  .  +  N, 
1  k-1  k+1  k 


<  *01* )  +  i*<8  +  ik  +  ik  *  s  +  iA<g+£ 


It  (11)  then  - - - 

n 


K“1  k 

(  E  N1+  k)M  +  ER^IC(Xt_1,At_1)  +  (n  -  E  N^ 


n 


k  )^cP(Rk+l)  +  2(k+l )  ^ 


N, 


k) 


k 

(n  -  E 
1 


fii  -  k><*<W  *  2^n  > 


<  g  +  r 
e  k 

<  g  +  € 

$\therefore$ $\varphi(R) = g$. QED


Thus  in  the  replacement  problem  there  always  exists  an  optimal 
non-randoraized  rule.  That  this  rule  cannot  always  be  taken  to  be 
stationary  is  shown  by  the  following  example. 

Example 4.4: Let $I = \{0, 1, 2, \dots\}$

$K_i = 3$,  $i = 0, 1, \dots$

The costs are as follows:

$C(i,0) = 1$ for all $i$

$C(i,1) = 1$ for all $i$

$C(i,2) = 1/(i+1)$ for all $i$

The transition probabilities are as follows:

$P(i,0:0) = 1$

$P(i,i+1:1) = 1$

$P(i,i:2) = 1$

In words, when in state $i$ we can choose to remain in state $i$ at
the cost of $1/(i+1)$ units (action 2), or go to state $i+1$ at the cost
of 1 unit (action 1), or return to state 0 at the cost of 1 unit (action 0).

[Actually the replacement action is superfluous in the sense that
action 1 is always a better action.]

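Let $R$ be any stationary deterministic rule.

Let $i(R)$ be the action $R$ chooses when in state $i$.

Let $R_1 = \min\{i : i(R) \ne 1\}$.

Starting from state 0, $R$ moves up to state $R_1$ and there either remains forever (average cost $1/(1+R_1)$) or returns to state 0 (average cost 1). Thus it is easy to see that

$R_1 < \infty \implies \varphi(R) \ge \frac{1}{1 + R_1} > 0$

$R_1 = \infty \implies \varphi(R) = 1 > 0$

$\therefore$ $R$ stationary deterministic $\implies \varphi(R) > 0$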

Let the non-random Markov rule $R^*$ be as follows:

When it first enters state $i$, $i = 0, 1, \dots$, $R^*$ chooses action 2
$i$ times and then it chooses action 1.
It is easy to see that $\varphi(R^*) = 0$.

It is also interesting to note that the stationary (but non-deterministic)
rule $R^{**}$ is also optimal, i.e. $\varphi(R^{**}) = 0$,
where $R^{**}$ is the rule which when in state $i$ selects action 2 with
probability $i/(i+1)$ and action 1 with probability $1/(i+1)$.  QED
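The claims of Example 4.4 can be checked numerically. The sketch below assumes the process starts in state 0. The function names are hypothetical and are not part of the paper. The first function follows $R^*$ stage by stage. The second evaluates a stationary deterministic rule that moves up to a fixed state `r1` and then remains there.

```python
def average_cost_r_star(n_steps):
    """Average cost per stage of R* over the first n_steps stages, from state 0."""
    state, stays_left, total = 0, 0, 0.0
    for _ in range(n_steps):
        if stays_left > 0:
            total += 1.0 / (state + 1)   # action 2: remain in `state`
            stays_left -= 1
        else:
            total += 1.0                 # action 1: move to the next state
            state += 1
            stays_left = state           # R* uses action 2 i times in state i
    return total / n_steps

def average_cost_stay_at(r1, n_steps):
    """Average cost of the rule using action 1 below state r1 and action 2 at r1."""
    moves = min(r1, n_steps)             # stages spent climbing, cost 1 each
    return (moves + (n_steps - moves) / (r1 + 1)) / n_steps

print(average_cost_r_star(10**4), average_cost_r_star(10**5))
print(average_cost_stay_at(9, 10**6))
```

The average under $R^*$ keeps shrinking as the horizon grows, while the stationary deterministic rule settles near $1/(1+R_1)$, in line with the bounds stated in the example.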

We  defined,  for  the  replacement  problem,  the  average  cost  in  terms 
of  the  lim  sup  as  opposed  to  the  lim  inf.  The  question  arises  whether 
or  not  this  is  a  meaningful  difference.  We  show  that  it  is  not  and 
both  criteria  are  in  a  sense  alike. 

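Let $\underline{g} = \inf_R \underline{\varphi}(R)$ and $g = \inf_R \varphi(R)$, where

\[ \underline{\varphi}(R) = \liminf_{n} \frac{E_R \sum_{t=0}^{n-1} C(X_t,A_t)}{n} \]

\[ \varphi(R) = \limsup_{n} \frac{E_R \sum_{t=0}^{n-1} C(X_t,A_t)}{n} \]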

Theorem 4. : For the replacement problem

$\underline{g} = g$

Proof: Choose $\varepsilon > 0$.

Let $R$ be such that $\underline{\varphi}(R) < \underline{g} + \varepsilon/2$.

Choose $N$ such that

\[ \frac{E_R \sum_{t=0}^{N-1} C(X_t,A_t) + M}{N+1} < \underline{\varphi}(R) + \varepsilon/2 . \]

Define $R'$ as follows:

$R'$ follows (takes the same actions as) $R$ at times $0, \dots, N-1$

$R'$ takes action $a_0$ at time $N$

Thus the process is now in state 0 and we consider it as starting all
over again - i.e. we forget that the history up to this time has ever
taken place. $R'$ now follows $R$ for the next $N$ stages, then takes
$a_0$, then follows $R$ (pretending the previous history never took place)
for the next $N$ stages, then takes $a_0$, etc.

Then it is easy to see that

\[ \varphi(R') = \underline{\varphi}(R') \le \frac{E_R \sum_{t=0}^{N-1} C(X_t,A_t) + M}{N+1} \]

$\therefore\ \varphi(R') < \underline{\varphi}(R) + \varepsilon/2 < \underline{g} + \varepsilon$

$\therefore\ g \le \underline{g}$

$\therefore\ g = \underline{g}$, since by definition $g \ge \underline{g}$.

QED


Corollary 4.5: (i) There exist $\varepsilon$-optimal stationary deterministic
rules with respect to the lim inf criterion.

(ii) The lim sup optimal non-randomized Markov rule $R$
of Theorem 4.3 has $\underline{\varphi}(R) = \underline{g}$.

Proof: (i) $\underline{\varphi}(R) \le \varphi(R)$ and so the result follows from Corollary 4.2 and
the above theorem.

(ii) $\underline{g} \le \underline{\varphi}(R) \le \varphi(R) = g = \underline{g}$.  QED
5.  Counterexample


In  4.4  an  example  is  given  of  a  process  for  which  there  is  an  optimal 
stationary  rule  but  not  an  optimal  stationary  deterministic  rule.  This 
brings  up  the  question  first  raised  by  Derman  [4]  of  whether  one  need  ever 
go  outside  the  class  of  stationary  rules.  The  following  is  an  example  for 
which  there  is  an  optimal  nonstationary  rule  which  is  better  than  every 
stationary  rule.  The  optimal  rule  may  be  taken  to  be  a  non-random  Markov 
rule. 

Example 5.1: Let $I = \{0, 1, 1', 2, 2', 3, 3', \dots\}$

$K_0 = 1$,  $K_i = 2$,  $i = 1, 2, \dots$,  $K_{i'} = 1$,  $i' = 1', 2', \dots$

The cost depends only on the state and is zero except at
state 0.

$C(0,\cdot) = 1$

$C(i,\cdot) = 0$,  $i = 1, 2, \dots$

$C(i',\cdot) = 0$,  $i' = 1', 2', \dots$

The transition probabilities are as follows:

$P(i,i+1:1) = 1/2$,  $i = 0, 1, \dots$

$P(i,0:1) = 1/2$,  $i = 0, 1, \dots$

$P(i,i':2) = 1 - (1/2)^i$,  $i = 1, 2, \dots$

$P(i,0:2) = (1/2)^i$,  $i = 1, 2, \dots$

$P(i',i':1) = 1 - (1/2)^i$,  $i' = 1', 2', \dots$

$P(i',0:1) = (1/2)^i$,  $i' = 1', 2', \dots$

Suppose that $X_0 = 0$.


Let $R_i$ be the stationary deterministic rule which takes action 2 when
in state $i$ and action 1 in all other states.

Define a cycle $T$ as the time it takes to return to state 0.
Then it is well known that

\[ \varphi(0,R_i) = 1/E_{R_i}(T) . \]

Now

\[ E_{R_i}(T) = \sum_{j=1}^{i} j(1/2)^j + (1/2)^i (i + 2^i) = 3 - (1/2)^{i-1} < 3 \]

\[ \therefore\ \varphi(0,R_i) = \frac{1}{3 - (1/2)^{i-1}} > \frac{1}{3}, \qquad i = 1, 2, \dots \]

Now let $R$ be any stationary rule. Let $p_i$ be the probability that $R$
chooses action 2 when in state $i$.

Now

\[ \varphi(0,R) = 1/E_R(T) \]

and

\[ E_R(T) = \sum_{i=1}^{\infty} \Big( \prod_{j=1}^{i-1} (1 - p_j) \Big) p_i E_{R_i}(T) + 2 \prod_{i=1}^{\infty} (1 - p_i) \]

\[ < 3 \Big[ \sum_{i=1}^{\infty} p_i \prod_{j=1}^{i-1} (1 - p_j) + \prod_{i=1}^{\infty} (1 - p_i) \Big] = 3 \]

$\therefore\ \varphi(0,R) > \frac{1}{3}$ for any stationary rule $R$.

But if we define $N_i$, $i = 1, 2, \dots$ as in Theorem 4.3 we can let $R^*$ be
the nonrandom Markov rule which uses

    $R_1$ for $t = 1, \dots, N_1$
    $R_2$ for $t = N_1 + 1, \dots, N_1 + N_2$
    ...
    $R_i$ for $t = \sum_{j=1}^{i-1} N_j + 1, \dots, \sum_{j=1}^{i} N_j$
    etc.
Then we can show as before that

\[ \varphi(0,R^*) = \lim_{i \to \infty} \varphi(0,R_i) = 1/3 \]

$\therefore\ \varphi(0,R^*) < \varphi(0,R)$ for all stationary $R$.

It also follows by Theorem 3.1 that $R^*$ is optimal.
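The cycle-length formulas of Example 5.1 can be checked with exact rational arithmetic. The following is a sketch. The function names are hypothetical, and the randomized rule is assumed to choose action 2 with positive probability in only finitely many states, so that the infinite sum and product become finite.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def expected_cycle_deterministic(i):
    """E_{R_i}(T) = sum_{j=1}^{i} j (1/2)^j + (1/2)^i (i + 2^i)."""
    early = sum(j * HALF**j for j in range(1, i + 1))   # return before reaching state i
    late = HALF**i * (i + 2**i)                         # reach state i, then wait in i'
    return early + late

def expected_cycle_randomized(p):
    """E_R(T) for a stationary rule taking action 2 in state i with probability p[i-1].

    States beyond len(p) are assumed to always use action 1.
    """
    total = Fraction(0)
    not_chosen = Fraction(1)          # probability action 2 is not used in states 1..i-1
    for i, p_i in enumerate(p, start=1):
        total += not_chosen * p_i * expected_cycle_deterministic(i)
        not_chosen *= 1 - p_i
    return total + 2 * not_chosen     # never using action 2 gives E(T) = 2

for i in (1, 2, 5, 10):
    print(i, expected_cycle_deterministic(i), float(1 / expected_cycle_deterministic(i)))
```

The printed average costs $1/E_{R_i}(T)$ decrease toward $1/3$ without reaching it, which is why no stationary rule attains the value $1/3$ achieved by $R^*$.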

